1 Introduction


During this reporting period we have worked on three somewhat different
problems: modeling of video traffic in packet networks, low rate video
compression, and the development of a lossy + lossless image compression
algorithm, which might have some application in browsing. The lossy +
lossless scheme is an extension of work previously done under this grant; it
provides a simple technique for incorporating browsing capability. The low
rate coding scheme is also a simple variation on the standard DCT coding
approach. In spite of its simplicity, the approach provides surprisingly high
quality reconstructions. The modeling approach is borrowed from the speech
recognition literature and seems promising, in that it provides a simple way
of obtaining an idea of the second order behavior of a particular coding
scheme. Details are presented in the following sections.

2 Lossy+Lossless Compression 

Lossless compression of images consists of two steps: a decorrelation step, in
which the redundancies within the image are exploited to reduce the first
order entropy of the image, and a coding step, in which variable length codes
are used to provide coding rates close to the entropy. The second step has
been very well studied, with the development of coding schemes for sources
with known statistics, such as Huffman codes [1] and arithmetic codes.
More recently, universal coding schemes such as the Rice algorithm [2] have
been developed for coding sources with unknown statistics. The problem
of decorrelation is not as well studied, and to date the best decorrelation
strategies have been predictive techniques.

The recently proposed JPEG still image compression standard [3] uses predictive
techniques to decorrelate the image. It provides eight different predictive
schemes from which the user can select. Table 1 lists the eight predictors.
The first scheme makes no prediction, the next three are one-dimensional
predictors, and the last four are two-dimensional prediction schemes.
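To make the eight predictors of Table 1 concrete, here is a minimal Python sketch of the predictors and of the residual (prediction error) image they produce. The function names `jpeg_predict` and `residual_image` are hypothetical, border pixels (first row and first column) are simply skipped, and the halving in modes 5 to 7 uses floor division; these are assumptions of the sketch, not details taken from the standard.

```python
def jpeg_predict(P, i, j, mode):
    """Prediction for P[i][j] using one of the eight predictors of Table 1.

    P is a list of rows of integer pixel values; i >= 1 and j >= 1 so that
    all three neighbours exist. Halving uses floor division (an assumption).
    """
    if mode == 0:
        return 0                      # mode 0: no prediction
    a = P[i - 1][j]                   # P[i-1,j]
    b = P[i][j - 1]                   # P[i,j-1]
    c = P[i - 1][j - 1]               # P[i-1,j-1]
    if mode == 1:
        return a
    if mode == 2:
        return b
    if mode == 3:
        return c
    if mode == 4:
        return b + a - c
    if mode == 5:
        return b + (a - c) // 2
    if mode == 6:
        return a + (b - c) // 2
    if mode == 7:
        return (b + a) // 2
    raise ValueError("mode must be in 0..7")


def residual_image(P, mode):
    """Prediction-error image for all pixels with i >= 1 and j >= 1."""
    return [[P[i][j] - jpeg_predict(P, i, j, mode) for j in range(1, len(P[0]))]
            for i in range(1, len(P))]
```

For example, with `P = [[10, 12], [14, 17]]` the mode 4 prediction of the lower-right pixel is 14 + 12 - 10 = 16, so its residual is 1.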

Mode   Prediction for P[i,j]
0      0 (No Prediction)
1      P[i-1,j]
2      P[i,j-1]
3      P[i-1,j-1]
4      P[i,j-1] + P[i-1,j] - P[i-1,j-1]
5      P[i,j-1] + (P[i-1,j] - P[i-1,j-1])/2
6      P[i-1,j] + (P[i,j-1] - P[i-1,j-1])/2
7      (P[i,j-1] + P[i-1,j])/2

Table 1: JPEG Predictors for lossless coding

Sayood and Anderson [4] propose a switched prediction scheme which has
static ordering and replacement functions and a backward adaptive
neighborhood function. They scan the image in raster order, predicting the
value of the current pixel by using a reference pixel (say, the left neighbor).
If the prediction error exceeds a certain threshold, the reference pixel for the
next pixel is switched (say, to the top neighbor). The scheme is very simple
and can be implemented efficiently in hardware.
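The switching rule can be sketched as follows. This is a minimal illustration under our own assumptions: the left/top pair of reference pixels given as examples in the text, a fixed threshold, the first row and column sent unpredicted, and the current reference carried over from one row to the next. The function names are hypothetical. Because the switch depends only on prediction errors that the decoder also sees, the decoder can track the reference pixel without any side information, which is what makes the neighborhood function backward adaptive.

```python
def switched_encode(P, threshold):
    """Residuals of a raster-scan switched predictor (first row/column skipped)."""
    use_left = True                     # start with the left neighbour as reference
    residuals = []
    for i in range(1, len(P)):
        row = []
        for j in range(1, len(P[0])):
            pred = P[i][j - 1] if use_left else P[i - 1][j]
            err = P[i][j] - pred
            row.append(err)
            if abs(err) > threshold:    # large error: switch reference for next pixel
                use_left = not use_left
        residuals.append(row)
    return residuals


def switched_decode(first_row, first_col, residuals, threshold):
    """Rebuild the image from its first row, first column and the residuals."""
    n_rows, n_cols = len(first_col), len(first_row)
    P = [list(first_row)] + [[first_col[i]] + [0] * (n_cols - 1)
                             for i in range(1, n_rows)]
    use_left = True                     # decoder mirrors the encoder's state
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            pred = P[i][j - 1] if use_left else P[i - 1][j]
            err = residuals[i - 1][j - 1]
            P[i][j] = pred + err
            if abs(err) > threshold:
                use_left = not use_left
    return P
```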

Another simple but perhaps more effective technique, named MAP (Median
Adaptive Prediction), is given by Martucci [5]. Here, the median of a set of
predictions is used as the prediction from which the prediction error is
formed. Simulations were reported using the median of the following three
predictors:

1. P[i,j-1]

2. P[i-1,j]

3. P[i,j-1] + P[i-1,j] - P[i-1,j-1]

The results obtained were an improvement over any of the three predictors
taken individually. This is because the median adaptive predictor always
chooses either the best or the second best predictor among the candidate
predictors.
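A minimal sketch of the MAP predictor with the three candidates above, together with a small check of the claim that the median is always the best or second best candidate. The median can never be the worst: whichever side of the median the true value lies on, the candidate on the opposite side is farther away. The function names are hypothetical.

```python
def map_predict(P, i, j):
    """Median adaptive prediction: median of the three predictors listed above."""
    w = P[i][j - 1]                               # predictor 1: P[i,j-1]
    n = P[i - 1][j]                               # predictor 2: P[i-1,j]
    nw = P[i - 1][j - 1]
    candidates = sorted([w, n, w + n - nw])       # predictor 3: P[i,j-1] + P[i-1,j] - P[i-1,j-1]
    return candidates[1]                          # median of three values


def median_error_rank(x, predictions):
    """Rank (0 = smallest) of the median's absolute error among the candidates' errors."""
    med = sorted(predictions)[1]
    errors = sorted(abs(x - p) for p in predictions)
    return errors.index(abs(x - med))
```

For any true value `x` and any three predictions, `median_error_rank(x, predictions)` is 0 or 1, never 2.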

Given the success of lossy image compression techniques at generating an 
excellent visual approximation of an image at very low bit rates, the following 
scheme seems a natural candidate: 

• First generate a low bit rate representation of the image by some lossy 
technique. 
• Use this low bit rate approximation to decorrelate the image by forming
a residual which represents the difference between the original image
and the low rate approximation.

Such schemes are called lossy plus lossless schemes.

In order to reconstruct the image from the residual, the receiver would 
need to first have the low rate approximation. So one can see that in effect we 
have a decorrelation technique with a forward adaptive replacement function. 
Techniques that use a discrete cosine transform based lossy step have been
investigated in [6]. The use of the Walsh-Hadamard transform and the
S-transform in the lossy step was investigated in [7]; the same study also
investigated subband coding, using the Smith and Barnwell filter as well as the
quadrature mirror filter, to obtain a low bit rate approximation of the
image. Manohar and Tilton [8] give a vector quantization based lossy plus
lossless technique. They obtain improved performance by iterating the process
on the residual, using a special codebook for the residual image, and report
the best performance for three such iterations.

One advantage of lossy plus lossless techniques is that they provide the 
user with a low rate approximation of an image, based on which the decision 
for viewing the exact image can be made. This is generally called browsing 
capability. It finds applications in situations where the user may have to 
scan through a large database of images in order to find a specific image of 
interest. The disadvantage is that the final bit-rate is generally higher than 
the bit rate that would have been obtained if the image had been losslessly 
coded directly, instead of first going through the lossy encoding step [9]. 

Although a variety of schemes exist for image decorrelation, very few
comparative studies have been reported in the literature. A performance
comparison of some of the schemes listed above is given in [7] for medical
images; that study concluded that the HINT scheme was more effective than
the other schemes studied. However, in [9] it was observed that predictive
techniques outperform other techniques.

In this work we take the JPEG still image compression standard [3] as a
basis for comparison. We do so because our experience has shown the scheme
to be quite robust, yielding superior performance over a wide range of
images. We have chosen a set of test images (given in Appendix 1) on which
the eight different JPEG predictors were tried. Table 2 lists the entropy of
the residual image for each test image.
Image       JPEG 0  JPEG 1  JPEG 2  JPEG 3  JPEG 4  JPEG 5  JPEG 6  JPEG 7
USC-Girl     6.42    5.05    5.10    5.40    5.07    4.90    4.93    4.82
Girl         6.49    4.25    4.37    4.65    4.68    4.68    4.75    4.53
Lady         5.37    3.83    4.16    4.31    4.09    3.81    4.02    3.84
House        6.54    4.64    5.06    5.35    4.58    4.46    4.64    4.64
Couple       5.96    4.67    4.49    5.11    4.36    4.38    4.27    4.41
Tree         7.21    5.63    5.93    6.04    5.74    5.49    5.66    5.51
Satellite    7.31    6.15    6.39    6.55    6.09    5.90    6.01    5.89

Table 2: Entropy (bits/pixel) of the error image using the JPEG predictors
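The figures in Table 2 are first order entropies of the residual images. Assuming the usual histogram-based estimate (our assumption; the estimator is not spelled out in the text), they can be computed with a sketch such as the following, applied to the residual image produced by any of the Table 1 predictors. The function names are hypothetical.

```python
import math
from collections import Counter


def first_order_entropy(values):
    """First order entropy, in bits/symbol, of the histogram of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def residual_entropy(residual_rows):
    """Entropy in bits/pixel of a 2-D residual image given as a list of rows."""
    return first_order_entropy([v for row in residual_rows for v in row])
```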


In our technique we used a lossy compression scheme developed under a
grant from the Goddard Space Flight Center (NAG 5-916). The details of
the lossy scheme are described in a recently published paper [10], a copy of
which is included. The heart of this scheme is a recursively indexed quantizer
[4], which maps a large (possibly countably infinite) set into a small finite set.
This means that the entropy coding can be performed on a small alphabet,
which can result in substantial savings in hardware complexity. In our
implementation the size of the output alphabet varied from three to nine.
This can be contrasted with the JPEG lossless scheme, where the size of the
input alphabet of the entropy coder (the output alphabet of the decorrelation
scheme) is 511 (this could be reduced to 256 by being somewhat clever about
how the residuals are stored).
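A minimal sketch of the recursive indexing idea, applied to integer quantizer indices. We assume a symmetric alphabet of odd size whose two extreme symbols act as escapes ("add this value and read the next symbol"); this shows how an unbounded set of indices is carried by three to nine symbols, but it is not claimed to reproduce the exact quantizer of [10]. The function names are hypothetical.

```python
def riq_encode(q, levels):
    """Represent integer q with symbols from {-L, ..., L}, L = levels // 2.

    The two extreme symbols act as escapes: +L (or -L) means "add L (or -L)
    and read the next symbol", so any integer is reachable with a small alphabet.
    """
    if levels < 3 or levels % 2 == 0:
        raise ValueError("levels must be an odd number >= 3")
    L = levels // 2
    symbols = []
    while q >= L:
        symbols.append(L)
        q -= L
    while q <= -L:
        symbols.append(-L)
        q += L
    symbols.append(q)               # |q| < L here: a terminating symbol
    return symbols


def riq_decode(symbols, levels):
    """Invert riq_encode on a concatenated symbol stream."""
    L = levels // 2
    values, total = [], 0
    for s in symbols:
        total += s
        if abs(s) != L:             # a non-extreme symbol ends the current value
            values.append(total)
            total = 0
    return values
```

With a nine-level alphabet, for instance, the index 5 is sent as the two symbols 4, 1; with a three-level alphabet every index is a run of escape symbols terminated by 0.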

The scheme works as follows. The Edge Preserving DPCM (EPDPCM)
scheme is first used to encode the image at some required fidelity level. If
a lossless version is required, the difference between the reconstructed
image and the original is then transmitted to the receiver. One of the
attractive features of the EPDPCM scheme is that the reconstruction error can
be strictly confined within a predetermined limit. Thus, we could encode the
image so that the error is confined to the least significant bit, or to the m
least significant bits. This makes the lossless step very simple: depending on
the fidelity of the lossy step, we can send m bits per pixel without the need
for any further entropy coding.
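The property that makes the lossless step trivial can be illustrated with a simple closed-loop DPCM sketch. We assume a uniform quantizer with an even step size Δ (`delta`) and no limit on the index range, and, for brevity, a previous-pixel predictor on a single row rather than the two-dimensional predictors used in the actual scheme; no clipping to the pixel range is applied. Under these assumptions the reconstruction error always lies in [-Δ/2, Δ/2), so each residual takes one of Δ values and, when Δ is a power of two, fits in log2(Δ) bits: 1, 2 and 3 bits/pixel for Δ = 2, 4 and 8. The function names are hypothetical.

```python
def dpcm_encode(row, delta):
    """Closed-loop DPCM of one row with a uniform quantizer of (even) step delta.

    The quantizer index q is unbounded; in the scheme described in the text
    this is what the recursively indexed quantizer makes affordable.
    Predictor: previous reconstructed pixel (a simplification of this sketch).
    """
    indices, recon = [], []
    prev = 0                                  # hypothetical starting prediction
    for x in row:
        e = x - prev                          # prediction error
        q = (e + delta // 2) // delta         # uniform quantizer index
        xhat = prev + q * delta               # decoder-side reconstruction
        indices.append(q)
        recon.append(xhat)
        prev = xhat
    return indices, recon


def dpcm_decode(indices, delta):
    """Rebuild the lossy reconstruction from the quantizer indices alone."""
    recon, prev = [], 0
    for q in indices:
        prev = prev + q * delta
        recon.append(prev)
    return recon


def lossless_residual(row, recon, delta):
    """x - xhat lies in [-delta/2, delta/2); offset it into [0, delta)."""
    return [x - xh + delta // 2 for x, xh in zip(row, recon)]
```

The receiver recovers each pixel exactly as `xhat + r - delta // 2`.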

We tried a variety of predictors in the EPDPCM scheme. The two that
gave the best results were a predictor due to Harrison [11] and a variation of
the MAP predictor [5]. The Harrison predictor is of the form

$$\hat{P}(i,j) = \tfrac{3}{4}P(i,j-1) + \tfrac{3}{4}P(i-1,j) - \tfrac{1}{2}P(i-1,j-1)$$

The results are shown in Table 3.

                   Δ = 2                    Δ = 4                    Δ = 8
Image        Lossy  Lossless  Total   Lossy  Lossless  Total   Lossy  Lossless  Total
USC-Girl     3.93      1      4.93    2.84      2      4.84    1.95      3      4.95
Girl         3.91      1      4.91    2.84      2      4.84    1.97      3      4.97
Lady         3.14      1      4.14    2.17      2      4.17    1.52      3      4.52
House        3.75      1      4.75    2.74      2      4.74    1.92      3      4.92
USC-Couple   3.61      1      4.61    2.57      2      4.57    1.73      3      4.73
Tree         5.22      1      6.22    3.74      2      5.74    2.79      3      5.79
Satellite    5.49      1      6.49    3.95      2      5.95    2.94      3      5.94

Table 3: Rates (bits/pixel) for the Lossy + Lossless Scheme Using the Harrison Predictor

In these simulations we used a nine level recursively indexed quantizer.
The best results seem to occur for a step size (Δ) of four. The final lossless
performance is within about 0.3 bits/pixel of the best JPEG lossless scheme in
each case. For the USC-Girl and Satellite images, the lossy + lossless scheme
performs essentially as well as the JPEG schemes.

We also simulated the median adaptive predictor with one slight
modification. In its published form the MAP has infinite memory. This makes
it unsuitable for use in lossy schemes, as the quantization error will tend to
propagate. We therefore multiplied the prediction by a prediction coefficient
of 0.85. This makes the predictor leaky and allows the effect of the
errors to die out over time. The results are presented in Table 4.
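A sketch of the modification, under our assumption that the leak is applied by simply scaling the median prediction by the coefficient (0.85 in the text). In a closed-loop coder the neighbours are reconstructed values; if all three carry the same error, the prediction shifts by only 0.85 times that error, so repeated passes through the predictor attenuate it. The function name is hypothetical.

```python
def leaky_map_predict(R, i, j, rho=0.85):
    """MAP prediction scaled by the leak coefficient rho (0.85 in the text).

    R holds previously reconstructed pixels, as in a closed-loop lossy coder.
    If every neighbour is off by the same amount e, the prediction moves by
    rho * e, so with rho < 1 the effect of past errors dies out.
    """
    w = R[i][j - 1]                 # P[i,j-1]
    n = R[i - 1][j]                 # P[i-1,j]
    nw = R[i - 1][j - 1]            # P[i-1,j-1]
    return rho * sorted([w, n, w + n - nw])[1]
```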

As the MAP predictions are somewhat better than the predictions from
the Harrison predictor, we used a three level recursively indexed quantizer
for all but the Tree image. The best results in these simulations seem to be
obtained when Δ has a value of two (except for the Satellite image). Notice
that in this case the performance for some of the images is actually better
than the performance of the JPEG lossless scheme.

                   Δ = 2                    Δ = 4                    Δ = 8
Image        Lossy  Lossless  Total   Lossy  Lossless  Total   Lossy  Lossless  Total
USC-Girl     3.51      1      4.51    2.60      2      4.60    1.93      3      4.93
Girl         3.41      1      4.41    2.70      2      4.70    2.12      3      5.12
Lady         3.38      1      4.38    2.69      2      4.69    2.06      3      5.06
House        3.60      1      4.60    2.81      2      4.81    2.14      3      5.14
USC-Couple   3.40      1      4.40    2.49      2      4.49    1.79      3      4.79
Tree         5.67      1      6.67    4.47      2      6.47    3.36      3      6.36
Satellite    5.26      1      6.26    3.82      2      5.82    2.74      3      5.74

Table 4: Rates (bits/pixel) for the Lossy + Lossless Scheme Using the Median Adaptive
Predictor

We have presented a simple lossy + lossless compression scheme which
compares favorably with existing schemes in terms of bit rate. However,
to be truly competitive, the first lossy pass should have a significantly lower
bit rate, to accommodate quick previews. This could be done by subsampling
the image first and providing a coded version of the subsampled image to the
user. We are currently working on this problem.


3 Low Rate Video Coding 

Xiaomei Wang 

Transform coding is a widely accepted method for image and video
compression. The basic motivation behind transform coding is to remove
source redundancy by decomposing the input signal into components in the
frequency or transform domain, i.e., translating a set of data into another set
of less correlated, or more nearly independent, coefficients. Of particular
interest to image processing and image coding standards is the two dimensional
discrete cosine transform (DCT). The DCT provides a good match to the optimum
(covariance-diagonalizing, or Karhunen-Loeve) transform for most image
signals, and fast algorithms exist for computing it.

Traditionally, and for convenience, it is assumed that all of the important
coefficients are packed into a specific area of the transform domain;
this is called the "energy compaction" effect of the cosine transform. The
amount of compression depends upon the number of coefficients retained in
this area. Usually the low frequency area is considered more important than
the high frequency area. For this reason image data is often compressed by
coding and then transmitting only the low frequency components. But this
assumption is not always true.

Table 5: Block of difference image

Another possibility is to put a threshold on the magnitude of the transform
coefficients and set all coefficients with magnitudes below the threshold to
zero. This approach is more realistic because we do not assume any fixed
important area; instead the area is dynamic, depending on the characteristics
of the image. This is a more complex coding strategy, but it results in
a very high compression ratio while maintaining better picture quality, i.e.,
more detail and less blocking effect, compared to coding and transmitting only
the low frequency components.

In the following we describe a threshold transform coding scheme which 
incorporates DPCM and runlength coding. The motion picture sequence 
used for testing is that of a woman talking on the phone. This is one of the 
standard sequences used by the MPEG committee. 

In Figure 1 we show one of the frames from this sequence. The difference
image between the current frame and the quantized version of the last frame is
shown in Figure 2. We divide this image into N by N sub-blocks and
process each block separately. Let us take a look at a randomly chosen 8x8
block, shown in Table 5.

The coefficients obtained after the cosine transform are shown in Table 6.

 -3.0  -0.1   0.0   1.0   0.0  -1.4   0.8   0.2
 -0.3   0.5  -0.2   0.3  -2.9  -0.2  -0.8   0.0
  0.4   0.4   4.0  -0.1  -0.1  -0.5  -2.7  -0.3
  0.3  -0.0   1.7  -0.1  -0.8   0.6  -2.1  -1.4
 -1.3   1.2  -1.3   1.0  -2.1  -0.0  -1.3   0.4
  0.8  -0.9  -1.0   1.3  -3.1   1.2   0.1   0.6
 -1.3  -1.2  -0.2  -2.9   0.0   0.2  -0.2  -1.5
  2.3  -0.7   1.2   1.1  -1.4   0.9   0.1   1.4

Table 6: DCT coefficients

We can see that there is no obvious energy compaction for this block;
comparatively large values are scattered around the block. We can see this
more clearly by using the threshold strategy. We choose a threshold t as
a positive number and compare each coefficient in the transform domain with
the threshold. If the magnitude of the coefficient is less than the threshold,
we set it to zero; if it is greater than or equal to the threshold, we retain it
without change.

With t = 3, we get the block in Table 7.

 -3.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   4.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   0.0   0.0  -3.1   0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0

Table 7: Coefficients with threshold = 3
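The thresholding step that turns Table 6 into Table 7 is easy to sketch. Below, `dct2` and `idct2` are a direct (slow, but transparent) orthonormal 2-D DCT pair, and `threshold` implements the rule stated above: keep coefficients with magnitude at least t, zero the rest. The DCT normalization used to produce Table 6 is not stated, so the orthonormal scaling is our assumption, as is the row-by-row layout of the Table 6 values copied into `TABLE6`; all names are hypothetical.

```python
import math

# Table 6 values, laid out row by row (layout assumed).
TABLE6 = [
    [-3.0, -0.1, 0.0, 1.0, 0.0, -1.4, 0.8, 0.2],
    [-0.3, 0.5, -0.2, 0.3, -2.9, -0.2, -0.8, 0.0],
    [0.4, 0.4, 4.0, -0.1, -0.1, -0.5, -2.7, -0.3],
    [0.3, -0.0, 1.7, -0.1, -0.8, 0.6, -2.1, -1.4],
    [-1.3, 1.2, -1.3, 1.0, -2.1, -0.0, -1.3, 0.4],
    [0.8, -0.9, -1.0, 1.3, -3.1, 1.2, 0.1, 0.6],
    [-1.3, -1.2, -0.2, -2.9, 0.0, 0.2, -0.2, -1.5],
    [2.3, -0.7, 1.2, 1.1, -1.4, 0.9, 0.1, 1.4],
]


def _c(k, n):
    """Orthonormal DCT scale factor."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct2(block):
    """Orthonormal 2-D DCT-II of an n x n block (direct form, for clarity)."""
    n = len(block)
    return [[_c(u, n) * _c(v, n) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs):
    """Inverse of dct2 (orthonormal 2-D DCT-III)."""
    n = len(coeffs)
    return [[sum(_c(u, n) * _c(v, n) * coeffs[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


def threshold(coeffs, t):
    """Zero every coefficient whose magnitude is below t; keep the rest unchanged."""
    return [[c if abs(c) >= t else 0.0 for c in row] for row in coeffs]
```

For example, `threshold(TABLE6, 3)` keeps exactly the three coefficients of Table 7, and applying `idct2` to that result gives the approximation of the difference block that is actually coded.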

We can see that only three non-zero coefficients are left, and not all of them
are in the low frequency area. However, in spite of the fact that only three of
the original sixty-four coefficients are left, we will see that after the inverse
transform we can still get a very good looking picture. The image in Figure 3
was reconstructed after zeroing out all coefficients with a magnitude less than
three. As can be seen, this is a very good reproduction of the original image
with only a small fraction of the original data. The reconstructed image with
threshold = 5 is shown in Figure 4.

The setting of the threshold depends upon the compression as well as
the picture quality we need. The amount of compression is roughly
indicated by the number of non-zero coefficients left. The number of non-zero
coefficients as a function of the threshold is shown in Figure 5. As to
picture quality, after the inverse cosine transform and DPCM decoding
we compare the image with the original and compute the PSNR. The PSNR
as a function of the threshold is shown in Figure 6.

To transmit the block of coefficients, we first linearize the two dimensional
block along the zig-zag scanning path shown in Figure 7, which yields a
sequence of data that is almost all zeros except for a few non-zero values. We
use run length coding and Huffman codes for such data.
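A sketch of the zig-zag linearization. The exact path of Figure 7 is not reproduced here, so we assume the JPEG-style convention (first step to the right, then down and to the left); the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) visiting order of a JPEG-style zig-zag scan of an n x n block."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col (anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                 # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Linearize a square block along the zig-zag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

With this convention, and reading Table 7 with the row as the first index, the three retained coefficients land at scan positions 0, 12 and 46.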

The non-zero coefficients are quantized. The quantizer is designed based
on the probability distribution of the data. We show the relative distribution
for frame 25 in Figure 8. Here we choose threshold t = 5, so no retained
coefficient has a magnitude below 5 and the x-axis is centered at +5 or -5.
The distribution of data from the other frames is similar.

We design our quantizer based on this distribution; however, as there are
variations from frame to frame, we choose to have more output levels in
case these are needed in other frames. Since we use a Huffman code for
the quantizer outputs, the extra quantizer outputs do not increase the bit
rate much. The Huffman code we use for the non-zero coefficients is given in
Table 8.

The relative distribution of runs and the Huffman code designed for the
runs are shown in Table 9.
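For reference, a standard Huffman construction applied to the run counts of Table 9 is sketched below. Ties can be broken differently, so the individual codewords need not match Table 9, but any Huffman code for these counts has the same, minimum, average length. The helper names are hypothetical.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Binary Huffman code {symbol: bitstring} for a {symbol: count} table."""
    if len(freqs) == 1:
        return {s: "0" for s in freqs}
    tie = count()                                 # tie-breaker so dicts are never compared
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, code0 = heapq.heappop(heap)        # two least frequent subtrees
        f1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def average_length(code, freqs):
    """Average codeword length in bits/symbol."""
    total = sum(freqs.values())
    return sum(freqs[s] * len(code[s]) for s in freqs) / total


# Run-length counts of Table 9.
RUN_COUNTS = {0: 233, 1: 1651, 2: 477, 3: 184, 4: 151, 5: 120, 6: 94, 7: 77,
              8: 73, 9: 62, 10: 50, 11: 61, 12: 52, 13: 44, 14: 35, 15: 18}
```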

In order to reduce the number of bits, we do not code the last zero run
of each block. To do this we need a symbol for either the beginning
or the end of the block. We count the number of runs in each block and
send it as the header of each block. The relative probability distribution of
the number of runs in each block is shown in Table 10; again we use an entropy
code for coding efficiency.
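The block syntax described above can be sketched as follows. We read "number of runs" as the number of (zero run, coefficient) pairs in the block, which matches the "number of coefficients" of Table 10; since the decoder knows how many pairs to expect, the trailing zero run never has to be coded. Entropy coding of the header, runs and values is omitted, and the function names are hypothetical.

```python
def encode_block(seq):
    """Code one zig-zag-scanned block as (header, [(zero_run, value), ...]).

    header = number of non-zero coefficients; each value is preceded by the
    number of zeros in front of it. The final run of zeros is never sent.
    """
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return len(pairs), pairs


def decode_block(header, pairs, length=64):
    """Invert encode_block; trailing positions are filled with zeros."""
    seq = []
    for run, v in pairs[:header]:
        seq.extend([0] * run)
        seq.append(v)
    seq.extend([0] * (length - len(seq)))
    return seq
```

For example, a block whose only non-zero scan positions are 0, 12 and 46 (values -3.0, 4.0 and -3.1) is sent as the header 3 followed by the pairs (0, -3.0), (11, 4.0) and (33, -3.1).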

Because the distribution varies from frame to frame, there is no need to design
the Huffman code exactly according to one frame. The code in the table is
comparatively simple and efficient.

Using the above methods, the final bit rate for frame 25 is about 0.24
bits/pixel. The reconstructed image after coding is shown in Figure 9.
Using the same code for other frames, the bit rate varies a little, but not
much.

Quantizer output   Number of outputs   Code
0                  63340               No code
1                  1272                1
2                  763                 01
3                  115                 001
4                  42                  0001
5                  4                   00001
6                  0                   000001
7                  0                   0000001
8                  0                   00000001

Table 8: Quantizer outputs statistics and Huffman codes

Run length   Number of runs   Code
0            233              1110
1            1651             0
2            477              110
3            184              1001
4            151              1000
5            120              10111
6            94               10101
7            77               111111
8            73               111110
9            62               111101
10           50               101001
11           61               111100
12           52               101100
13           44               101000
14           35               1011011
15           18               1011010

Table 9: Relative distribution of runs and Huffman codes

Number of coefficients   Occurrence   Code
0                        445          1
1                        163          01
2                        106          001
3                        83           0001
4                        70           00001
5                        57           000001
6                        37           0000001
7                        30           00000001
8                        11           000000001
9                        12           0000000001
10                       3            00000000001
11                       4            000000000001
12                       2            0000000000001
13                       0            00000000000001
14                       1            000000000000001

Table 10: Distribution of number of coefficients in each block and a trivial
entropy code

We have described a simple, easy to implement low rate video coding
scheme. To keep the algorithm simple we have not used any motion
compensation or more complicated quantization techniques. The thresholding
operation can be made simpler if we choose the threshold to be a power of
two. In this case the thresholding would simply consist of shifting the least
significant bits out.
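For integer-valued coefficients the power-of-two case reduces to a shift, as in this sketch (integer coefficients and the name `threshold_pow2` are assumptions of the sketch): `abs(c) >> k` is zero exactly when `abs(c) < 2**k`, so no comparison with the threshold value is needed.

```python
def threshold_pow2(coeffs, k):
    """Zero integer coefficients with |c| < 2**k by shifting out k LSBs.

    abs(c) >> k is zero exactly when abs(c) < 2**k, so no comparison with
    the threshold value itself is needed.
    """
    return [c if (abs(c) >> k) else 0 for c in coeffs]
```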


4 Using a Hidden Markov Model as a Video Source Output Model

Yun-Chung Chen

4.1 Introduction 

Variable bit rate coding schemes will be implemented in ATM networks in
order to obtain flexibility and efficiency. This is important because the output
bit rate of a video source depends on the specific scene content and the
coding scheme used. Also, different types of video sources will have different
statistical characteristics and different bit rates, so it would be inefficient
to use fixed rate coding schemes. In this project we examine the CCITT
H.261 coding scheme, which is a proposed standard for video telephony or
single-activity motion scenes. We are interested in how to transmit the coded
video information across ATM networks efficiently. Performance simulations
are very important when designing a coding scheme which will hopefully best
fit into the future ATM environment, and efficient and accurate simulations
depend on accurate modeling. Unfortunately, the modeling of video sources is
more complicated than the modeling of voice sources, for which models such as
the Markov modulated Poisson process (MMPP) [12] are used. The continuous
state autoregressive processes used for video source modeling usually create
significant difficulties in any analytical treatment. Maglaris et al. [13]
develop a discrete state, continuous time Markov process to simplify the
analysis. In this project we use the concept of hidden Markov models (HMMs)
to simulate the variable output rate of a video source. Whenever analytical
analysis is impossible, we hope to gain some insight into the performance from
the simulation; even when analytical analysis is possible, the simulation
provides a comparison tool. We show that the HMM, as a video source output
model, can accurately reflect at least the second order video output statistics.

4.2 Problem Setup 

An HMM is characterized by the following: 

1. N, the number of states in the model. 

2. M, the number of distinct observation symbols per state. 

3. $A = \{a_{ij}\}$, the state transition probability distribution.

4. $B = \{b_j(k)\}$, the observation symbol probability distribution.

5. $\pi = \{\pi_i\}$, the initial state distribution.
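In the model used here the states are the seven quantization modes and the observations are the eight quantizer output symbols. A minimal sketch of how such a model (A, B, π) generates a synthetic output sequence, the kind of sequence later compared with the H.261 output, is given below. Mapping each symbol back to a representative number of bits per macroblock is left out, and the function name is hypothetical.

```python
import random


def sample_hmm(A, B, pi, length, seed=None):
    """Draw a state path and an observation sequence of `length` steps from (A, B, pi).

    A[i][j]: probability of moving from state i to state j.
    B[j][k]: probability of emitting symbol k in state j.
    pi[i]:   probability of starting in state i.
    """
    rng = random.Random(seed)

    def draw(dist):
        return rng.choices(range(len(dist)), weights=dist)[0]

    states, observations = [], []
    state = draw(pi)
    for _ in range(length):
        states.append(state)
        observations.append(draw(B[state]))
        state = draw(A[state])
    return states, observations
```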

Most video source models developed so far have used the frame as the unit when
modeling the output sequence [13]. Considering the data structure used in
the H.261 coding algorithm, we decided to use the macroblock (16 x 16 pixels)
as our unit, simply because the coding algorithm can adopt a different
quantization strategy for every macroblock. H.261 changes the step size of the
quantizer depending on buffer fullness: if the buffer is full, coarse
quantization produces less output and relieves the pressure on the buffer.
Using the quantization mode as the state in the HMM therefore seems a natural
choice, and we hope this choice can accurately reflect the bit rate
distribution in different quantization modes. As a test sequence we used the
Susie sequence, which is one of the MPEG standard sequences. We developed an
H.261 simulator and used frames 46-55 of the Susie sequence. These 10 frames
were chosen so as to cover the states, symbols and state transitions
adequately. Using these ten frames gave us 2560 output bit rates (one per
macroblock). In the coding simulation of these ten frames, the quantization
step travels back and forth in the set {8, 16, 24, 32, 40, 48, 56}. Each
quantization step denotes a state in our model; therefore N equals 7. The coder
generates zero output when dealing with a motionless macroblock and when the
buffer is full, so 0 is assigned as a distinct symbol. The observation symbols
are the outputs of the quantizer. We use an eight level quantizer, so M equals
eight.
Table 11: Initial condition A.opt for the transition probability matrix
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857
0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857

Table 12: Initial condition A.uni for the transition probability matrix


There are several possible ways of initializing the algorithm used for
developing the hidden Markov model. These depend on the selection of the
initial state transition matrix A, the matrix of observation symbol
probability distributions B, and the initial state probability distribution π.
The different initial values of these matrices used in this work are shown in
Tables 11 to 16. Eight different combinations of these initial parameters
were used to run the optimization. The *.uni versions are uniform distributions
for the A, B and π matrices. A.opt is the approximate form we think the A
matrix should take, since one can only travel between neighboring states.
B.opt is actually calculated from the H.261 simulation, so we can comfortably
assume it is optimal. The initial state probability distribution π.opt is
obviously correct, since we start the simulation with an empty buffer.
0.160000  0.240000  0.080000  0.120000  0.080000  0.040000  0.040000  0.240000
0.155555  0.133333  0.266666  0.222222  0.133333  0.044444  0.022222  0.022222
0.187214  0.109589  0.420091  0.178082  0.082191  0.013698  0.004556  0.004556
0.277456  0.057803  0.421905  0.173410  0.052023  0.005780  0.005780  0.005780
0.223034  0.113528  0.451553  0.138939  0.027422  0.020107  0.003656  0.000000
0.22576   0.125427  0.461104  0.114025  0.041049  0.022805  0.004561  0.000000
0.186943  0.154302  0.528189  0.062314  0.050445  0.016320  0.001483  0.000000

Table 13: Initial condition B.opt for the observation probability matrix


0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000
0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000
0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000
0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000
0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000
0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000
0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000  0.125000

Table 14: Initial condition B.uni for the observation probability matrix


1.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000

Table 15: Initial condition π.opt for the initial probability matrix


0.142857  0.142857  0.142857  0.142857  0.142857  0.142857  0.142857

Table 16: Initial condition π.uni for the initial probability matrix
0.971572  0.028428  0.000000  0.000000  0.000000  0.000000  0.000000
0.000000  0.723474  0.221840  0.006125  0.047594  0.000935  0.000044
0.000000  0.074619  0.880526  0.044584  0.000242  0.000000  0.000000
0.000000  0.004280  0.108101  0.869359  0.000116  0.001380  0.016837
0.000000  0.000000  0.000000  0.290687  0.705150  0.003746  0.000020
0.000000  0.000000  0.000000  0.000000  0.080689  0.858148  0.061029
0.000000  0.000000  0.000000  0.000000  0.000000  0.095528  0.904163

Table 17: A matrix at 29th iteration using A.opt, B.opt, and π.opt


0.272096  0.255770  0.106084  0.113691  0.056853  0.000000  0.024947  0.170560
0.009124  0.000000  0.085811  0.399809  0.325566  0.129653  0.040775  0.009262
0.008507  0.129088  0.760925  0.101409  0.000071  0.000000  0.000000  0.000000
0.897719  0.101437  0.000844  0.000000  0.000000  0.000000  0.000000  0.000000
0.014575  0.638788  0.346638  0.000000  0.000000  0.000000  0.000000  0.000000
0.000000  0.026007  0.436730  0.361496  0.174790  0.000000  0.000977  0.000000
0.021606  0.154724  0.814827  0.000000  0.000000  0.000000  0.008843  0.000000

Table 18: B matrix at 29th iteration using A.opt, B.opt, and π.opt


4.3 Discussion 

Contrary to our expectations, using A.opt, B.opt and π.opt as initial
parameters did not generate the optimal solution. Instead, the combination of
A.uni, B.opt and π.uni produced the highest score P(O|λ). This is not that
surprising if we consider that A.opt is actually not optimal. Some sample
values of A, B, and π are shown in Tables 17 through 22.

Because of the length of the sequence (2560) used in the optimization
procedure, we ran into the problem of underflow. We used a scaling algorithm to
correct most of the effects of the underflow. However, this was not sufficient,
and some elements of the B and π matrices sometimes became very small and
went to zero during the simulation. This caused problems, as computation
of the path metric requires the logarithms of the probabilities. Therefore,
whenever a number became very small, we artificially set it to a constant
($10^{-100}$). This probably does not affect the optimal path tracking, because
these small numbers are in the range of $10^{-100}$ and paths using them would
not be selected anyway. It does, however, affect the reestimation procedure
slightly, in that the probability distributions no longer sum exactly to 1.

1.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000

Table 22: π matrix at 136th iteration using A.uni, B.opt, and π.uni
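The underflow problem and the usual scaling remedy can be illustrated with a minimal sketch of the scaled forward recursion, which evaluates log P(O|λ) without ever forming the tiny unscaled probabilities. This is our illustration, not the code used in this study; the function name is hypothetical, and a zero-probability sequence is reported as minus infinity rather than handled with the $10^{-100}$ floor described above.

```python
import math


def log_likelihood(A, B, pi, obs):
    """log P(O | lambda) by the forward algorithm with per-step scaling.

    The forward variables are renormalized to sum to one at every step and
    the logs of the normalizing constants are accumulated, so nothing
    underflows even for a sequence of thousands of macroblocks.
    """
    n = len(pi)
    log_p = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")        # observation sequence impossible under the model
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p
```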

The optimal B matrix we obtain is not very close to B.opt, which is
actually observed from the experiment, but it still reflects the fact that
coarse quantization is more likely to produce small output. The average
value μ over all 10 frames and the standard deviation σ were found to be
μ = 26.05 bits/macroblock and σ = 22.98 bits/macroblock. Using the HMM, we
generated an output sequence with μ = 26.44 bits/macroblock and σ = 21.99
bits/macroblock, which is pleasantly close to our original data. Furthermore,
we calculated the autocorrelation

$$C(\tau) = \frac{E[\lambda(t)\,\lambda(t+\tau)] - \mu^2}{\sigma^2}$$
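The normalized autocorrelation used for Figures 10 and 11 can be estimated from a sequence of per-macroblock bit counts as sketched below. The estimator details (averaging the products over the pairs available at each lag, population mean and variance over the whole sequence) are our assumptions, and the function name is hypothetical.

```python
def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(tau) = (E[x(t) x(t+tau)] - mu**2) / sigma**2.

    E[x(t) x(t+tau)] is estimated by averaging over the len(x) - tau
    available pairs; mu and sigma**2 use the whole sequence.
    """
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    result = []
    for tau in range(max_lag + 1):
        pairs = n - tau
        mean_product = sum(x[t] * x[t + tau] for t in range(pairs)) / pairs
        result.append((mean_product - mu * mu) / var)
    return result
```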

Although the values for the two sequences are not close, the two
autocorrelation functions have almost the same shape (Figures 10 and 11).
It is encouraging that the model generates a correlation structure similar
to that of the H.261 output, since variance and covariance values usually
dominate the queuing behavior. Note that we are mostly interested in the
correlation behavior at small lags, because these values are used in the
queuing analysis. The 10-frame sequence from Susie contains a lot of motion,
as the woman is shaking her head. For frames with less motion, the
quantization step will not go as high. This means the hidden Markov model
developed here for the high-motion sequence is probably not suited to a
still sequence. Adopting the idea behind the Markov-modulated Poisson process
(MMPP), we can develop another hidden Markov model with a different mean and
variance for motionless sequences. We can then build another Markov chain
that switches between the two models (high/low motion) during the simulation.
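The Python sketch below illustrates the proposal in the spirit of an MMPP. A "regime" Markov chain chooses which HMM generates the macroblock outputs of the next frame. Several details are assumptions made for illustration: the switching happens once per frame, the chain starts in regime 0, and each model is passed as an (A, B, π, symbols) tuple.

```python
import random


def sample_switched(models, regime_A, frames, mbs_per_frame, seed=0):
    """Generate bits/macroblock from a regime-switched set of HMMs.

    models:   list of (A, B, pi, symbols) tuples, one per regime
              (e.g. index 0 = low motion, 1 = high motion).
    regime_A: transition matrix of the regime Markov chain, stepped once per frame.
    Returns (outputs, regime_of_each_frame).
    """
    rng = random.Random(seed)
    # Each regime keeps its own hidden state between visits.
    states = [rng.choices(range(len(pi)), weights=pi)[0] for _, _, pi, _ in models]
    regime = 0                       # assumption: start in regime 0
    out, regimes = [], []
    for _ in range(frames):
        A, B, _, symbols = models[regime]
        s = states[regime]
        for _ in range(mbs_per_frame):
            out.append(symbols[rng.choices(range(len(symbols)), weights=B[s])[0]])
            s = rng.choices(range(len(A)), weights=A[s])[0]
        states[regime] = s
        regimes.append(regime)
        regime = rng.choices(range(len(regime_A)), weights=regime_A[regime])[0]
    return out, regimes


# Degenerate single-state models, so the effect of switching is easy to see.
low = ([[1.0]], [[1.0]], [1.0], [4])
high = ([[1.0]], [[1.0]], [1.0], [40])
out, regimes = sample_switched([low, high], [[0.0, 1.0], [1.0, 0.0]],
                               frames=4, mbs_per_frame=3)
```

In practice, each entry of `models` would be an HMM trained on frames with the corresponding amount of motion. The diagonal of `regime_A` would then set how long each motion regime tends to persist.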
Figure 1. Original image from the Susie sequence (frame 25)

Figure 2. Difference image

Figure 3. Reconstructed image with threshold = 3 (no coding)

Figure 4. Reconstructed image with threshold = 5 (no coding)

Figure 7. Zig-zag scanning pattern

Figure 9. Reconstructed image with coding rate 0.24 bpp



Figure 10. Autocorrelation of H.261 output

Figure 11. Autocorrelation of model output


Appendix 
Test Images 



Figure 0.1: Clockwise from top left to bottom left 1) USC-Girl 2) Girl 3) Lady 4) 
House 
An Edge Preserving Differential Image Coding Scheme

Martin C. Rost and Khalid Sayood 

Abstract — Differential encoding techniques are fast and easy to im- 
plement. However, a major problem with the use of differential encod- 
ing for images is the rapid edge degradation encountered when using 
such systems. This makes differential encoding techniques of limited
utility, especially for coding medical or scientific images, where edge
preservation is of utmost importance. We present a simple, easy-to-implement
differential image coding system with excellent edge preservation
properties. The coding system can be used over variable-rate channels,
which makes it especially attractive for use in the packet network
environment.


I. Introduction 

The transmission and storage of digital images requires an enor- 
mous expenditure of resources, necessitating the use of compres- 
sion techniques. These techniques include relatively low complex- 
ity predictive techniques such as adaptive differential pulse code 
modulation (ADPCM) and its variations, as well as relatively higher 
complexity techniques such as transform coding and vector quantization
[1], [2]. Most compression schemes were originally developed for speech,
and their application to images is at times problematic. This is especially
true of the low complexity predictive techniques. A good example of this
is the highly popular ADPCM scheme. Originally designed for speech [3], it
has been used with other sources with varying degrees of success. A major
problem with its use in image coding is the rapid degradation in quality
whenever an edge is encountered. Edges are perceptually very im- 
portant, and therefore, their degradation can be perceptually very 
annoying. If the images under consideration contain medical or sci- 
entific data, the problem becomes even more important, as edges 
provide position information which may be crucial to the viewer. 
This poor edge reconstruction quality has been a major factor in 
preventing ADPCM from becoming as popular for image coding 
as it is for speech coding. While good edge reconstruction capa- 
bility is an important requirement for image coding schemes, an- 
other requirement that is gaining in importance with the prolifera- 
tion of packet switched networks is the ability to encode the image 
at different rates. In a packet switched network, the available chan- 
nel capacity is not a fixed quantity, but rather fluctuates as a func- 
tion of the load on the network. The compression scheme must, 
therefore, be capable of taking advantage of increased capacity 
when it becomes available while providing graceful degradation 
when the rate decreases to match decreased available capacity. 

In this paper we describe a DPCM-based coding scheme which 
has the desired properties listed above. It is a low complexity 
scheme with excellent edge preservation in the reconstructed im- 
age. It takes full advantage of the available channel capacity pro- 
viding lossless compression when sufficient capacity is available, 
and very graceful degradation when a reduction in rate is required. 

II. Notation and Problem Formulation 

The DPCM system consists of two main blocks, the quantizer 
and the predictor (see Fig. 1). The predictor uses the correlation 
between samples of the waveform s(k) to predict the next sample 
value. This predicted value is removed from the waveform at the 
transmitter and reintroduced at the receiver. The prediction error
is quantized to one of a finite number of values. This quantized prediction
error, denoted by e_q(k), is coded and transmitted to the receiver. The
difference between the prediction error and the quantized prediction error
is called the quantization error or the quantization noise. If the
channel is error free, the reconstruction error at the receiver is simply
the quantization error. To see this, note (Fig. 1) that the prediction 
error e(k) is given by 

e(k) = s(k) - p(k)                                        (1)
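The Python sketch below implements the loop just described. It assumes a previous-sample predictor and a uniform mid-tread quantizer, and neither is the recursive quantizer proposed in this paper. The receiver forms ŝ(k) = p(k) + e_q(k), so s(k) - ŝ(k) = e(k) - e_q(k). The code checks this identity numerically. When the sketch is run on a simulated edge with a five-level quantizer, it also shows the slow catch-up of standard DPCM that is discussed in Section IV.

```python
def quantize(e, step, levels):
    """Uniform mid-tread quantizer with an odd number of output levels.

    Returns the quantized value e_q = idx * step, idx clipped to +/-(levels-1)//2.
    """
    half = (levels - 1) // 2
    idx = max(-half, min(half, round(e / step)))
    return idx * step


def dpcm(signal, step, levels):
    """Closed-loop DPCM with the previous reconstructed sample as predictor.

    Returns (reconstruction, quantization_error) lists.
    """
    recon, qerr = [], []
    prev = 0                          # encoder and decoder both start from 0
    for s in signal:
        p = prev                      # prediction p(k) = s_hat(k - 1)
        e = s - p                     # prediction error, eq. (1)
        e_q = quantize(e, step, levels)
        s_hat = p + e_q               # the decoder forms exactly the same value
        recon.append(s_hat)
        qerr.append(e - e_q)          # quantization error
        prev = s_hat
    return recon, qerr


# A simulated one-dimensional edge coded with a five-level quantizer.
edge = [10] * 5 + [100] * 10
recon, qerr = dpcm(edge, step=10, levels=5)
# The outer levels are +/-20, so the reconstruction climbs the 90-unit edge
# in steps of 20 and needs several samples to catch up (slope overload).
```

Integer samples and an integer stepsize keep the arithmetic exact, so the identity between the reconstruction error and the quantization error holds with no floating-point tolerance.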
with time. The actual change can be accommodated by changing
the stepsize and reducing the lossless encoder codebook size by the 
same amount. Several of the systems proposed above were simu- 
lated. The results of these simulations are presented in the next 
section. 


IV. Results 

Before we provide the results using images, let us examine the 
performance of the scheme when applied to a one-dimensional sig- 
nal containing a simulated edge. This signal was first encoded us- 
ing a five-level quantizer. The results are shown in Fig. 3(a). As 
can be seen, it takes a little while for the DPCM system to catch 
up. In an image this would cause a smearing of the edge. When 
the proposed system with the same parameters is used there is no 
such effect, as is clear from Fig. 3(b). The quantizer in this case 
went into the recursive mode twice, once at the leading and once 
at the trailing edge. To get an equivalent effect, a standard DPCM 
system would have to have a forty-level quantizer. To show that 
this performance is maintained when the system is used with two- 
dimensional images, two systems of the type described in the pre- 
vious section have been simulated. Both systems use the following 
two-dimensional fixed predictor:
p(k) = (2/3)s(k - 1) + (2/3)s(k - 256) - (1/3)s(k - 257),
where, for 256-pixel-wide images scanned in raster order, the offsets 1, 256,
and 257 select the left, upper, and upper-left neighbors of pixel k. One of
the systems contains the lossless encoder followed by a runlength encoder,
while the other contains only the lossless encoder. The test images used
were the USC GIRL image and the USC COUPLE
image. Both are 256 by 256 monochrome 8-b images and have been 
used often as test images. The objective performance measures were 
the peak signal-to-noise ratio (PSNR) and the mean absolute error 
(MAE) which are defined as follows: 

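$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\langle (s(k) - \hat{s}(k))^2 \rangle} \ \mathrm{dB}$$

$$\mathrm{MAE} = \langle |s(k) - \hat{s}(k)| \rangle$$

where $\langle \cdot \rangle$ denotes the average value.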
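The two measures translate directly into code. The Python sketch below treats each image as a flat, raster-scanned list of 8-bit pixel values, matching the single index k used above. Returning infinity for a lossless reconstruction is a convention chosen for the sketch.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB: 10 log10(peak^2 / <(s - s_hat)^2>), averaged over all pixels."""
    n = len(original)
    mse = sum((s - r) ** 2 for s, r in zip(original, reconstructed)) / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def mae(original, reconstructed):
    """Mean absolute error <|s - s_hat|>."""
    n = len(original)
    return sum(abs(s - r) for s, r in zip(original, reconstructed)) / n


s = [0, 10, 20, 30]          # toy "image"
s_hat = [1, 10, 19, 30]      # toy reconstruction: mean squared error 0.5
print(psnr(s, s_hat), mae(s, s_hat))
```

For the toy data, the mean squared error is 0.5, so the PSNR is 10 log10(65025/0.5), about 51.1 dB, and the MAE is 0.5.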

Several initial test runs were performed using different numbers
of levels, different values of x_L, and different values of Δ, to get a
feel for the optimum values of the various parameters (given x_L and
Δ, x_H is automatically determined). We found that an appropriate
way of selecting the value of x_L was using the relationship


where ⌊x⌋ is the largest integer less than or equal to x, and N is
the size of the alphabet of the lossless coder. This provides a sym- 
metric codebook when the alphabet size is odd, and a codebook 
skewed to the positive side when the alphabet size is even. The 
zero value is always in the codebook. 
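The equation for x_L did not survive extraction. One rule that fits every property stated here is x_L = -⌊(N-1)/2⌋Δ, with the levels spaced Δ apart. That rule gives a symmetric codebook for odd N and a positive skew for even N, always includes zero, and fixes x_H once x_L and Δ are chosen. The Python sketch below uses this rule purely as an assumption.

```python
def codebook(n_levels, step):
    """Quantizer levels for alphabet size N = n_levels and stepsize `step`.

    ASSUMPTION: x_L = -floor((N - 1) / 2) * step with uniform spacing `step`;
    the report's own equation for x_L was lost in extraction.
    """
    x_low = -((n_levels - 1) // 2) * step
    return [x_low + i * step for i in range(n_levels)]   # [x_L, ..., x_H]


print(codebook(5, 4))   # odd N, symmetric: [-8, -4, 0, 4, 8]
print(codebook(8, 2))   # even N, skewed positive: [-6, -4, -2, 0, 2, 4, 6, 8]
```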

Fig. 3. Coding of simulated one-dimensional edge with (a) DPCM, (b)
proposed system.

As the alphabet size is usually not a power of two, the binary
code for the output alphabet will be a variable length code. The
use of variable length codes always brings up issues of robustness
with respect to changing input statistics. With this in mind, the rate
was calculated in two different ways. The first was to find the output
entropy and scale it by the ratio of the number of symbols transmitted
to the number of pixels encoded. We call this rate the entropy rate. It
is the minimum rate obtainable if we assume that the output of the
lossless encoder is memoryless. While this assumption is not necessarily
true, the entropy rate gives us an idea of the best we can do with a
particular system. We also calculated the rate using a predetermined
variable length code. This code was designed with no prior knowledge of
the probabilities of the different letters. The only assumption was that
the letters representing the inner levels of the quantizer were always
more likely than the letters representing the outer levels of the
quantizer. The code tree used is shown in Fig. 4. This code becomes
highly inefficient for small alphabet sizes and small Δ, because in
that case the outer levels x_L and x_H occur quite frequently. This
rate can be viewed as an upper bound on the achievable rate.
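Both rate calculations can be sketched in Python as follows. `entropy_rate` follows the description directly. The codeword lengths passed to `code_rate` are hypothetical, because the actual tree of Fig. 4 is not reproduced here. They were chosen only so that inner levels get shorter codewords and the Kraft sum equals 1. The symbol sequence is also made up. It contains more symbols than pixels, as happens when the recursive quantizer emits extra symbols.

```python
import math
from collections import Counter


def entropy_rate(symbols, n_pixels):
    """First-order entropy of the transmitted symbols (bits/symbol), scaled by
    the number of symbols transmitted per pixel, giving bits/pixel."""
    n = len(symbols)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())
    return h * n / n_pixels


def code_rate(symbols, code_lengths, n_pixels):
    """Bits/pixel when each symbol is sent with a fixed prefix code."""
    return sum(code_lengths[s] for s in symbols) / n_pixels


# HYPOTHETICAL prefix code for a five-letter alphabet (quantizer indices -2..2).
lengths = {0: 1, 1: 2, -1: 3, 2: 4, -2: 4}
sent = [0, 0, 1, -1, 0, 2, 0, -2]   # 8 symbols transmitted for 6 pixels
lower, upper = entropy_rate(sent, 6), code_rate(sent, lengths, 6)
```

Here the entropy is 2 bits/symbol, so the entropy rate is 2 × 8/6 ≈ 2.67 b/pixel. The fixed code spends 17 bits, which gives ≈ 2.83 b/pixel. The fixed code stays above the entropy rate, consistent with treating the two as lower and upper figures.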

The results for the system without the runlength encoder are 
shown in Tables I and II. Table I contains the results for the COU- 
PLE image, while Table II contains the results for the GIRL image. 
In the tables, R_L denotes the entropy rate, while R_U is the rate obtained
using the Huffman code of Fig. 4. Recall that for image compression
schemes, reconstructions with PSNR values greater than 35 dB are
perceptually almost identical to the original. As the PSNR values in the
tables show, there is very little degradation with rate. In fact, if we
use the 35-dB criterion, there is almost no degradation in image quality
until the rate drops below 2 b/pixel. This can be verified from the
reconstructed images shown in Fig. 5. Each picture in Fig. 5 consists of
the original image, the reconstructed image, and the error image magnified
tenfold. In each of the pictures, it is extremely difficult to tell the
original image from the reconstructed image. The error images support this
subjective observation. In each case they are uniform in texture
throughout, without the edge artifacts usually seen in the error images of
most compression schemes.

We can see from the results that if the value of Δ, and hence x_r,
is fixed, the size of the codebook has no effect on the performance
measures. This is because the only effect of reducing the codebook 
size under these conditions is to increase the number of symbols 
transmitted. While this has the effect of increasing the rate, because 
of the way the system is constructed it does not influence the re- 
sulting distortion. The drop in rate for the same distortion as the 
alphabet size increases can be clearly seen from the results in Ta- 
bles I and II. 

Table III and Table IV show the decrease in rate when a simple
runlength coder is used. The runlength coder encodes long strings
of x_L and x_H using the special sequences mentioned previously. As
the results show, the improvement provided by the current runlength
encoding scheme is significant only for small alphabets and small
values of Δ. This is because most of the long strings of x_L and x_H
are generated under these conditions. However, we are not yet using many
of the special sequences in the larger alphabet codebooks, so there is
certainly room for improvement.

Finally, to show the effect of changing rate on the perceptual
quality, the USC GIRL image was encoded using three different
rates. The top quarter of the image was encoded using a codebook
size of eight and a Δ of two, resulting in a rate of 4.37 b/pixel.
The second quarter of the image was encoded using a codebook of
size five and a Δ of four, resulting in a rate of 2.86 b/pixel. The
bottom half of the image was encoded using a codebook size of
three and a Δ of eight, resulting in a rate of 2.36 b/pixel. The original
and reconstructed images are shown in Fig. 6. The fact that the 
image is coded with three different rates can only be noticed if the 
viewer is already aware of this fact and then only after very close 
scrutiny. The fact that the image was encoded using three different 
rates is clear in the magnified error image shown in Fig. 7. This 
property of the coding scheme would be extremely useful if changes 
in the transmission bandwidth forced the coder to operate at differ- 
ent rates. 
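As a rough check on the three-rate example, the average rate for the whole image is computed below. This assumes that "quarter" and "half" refer to fractions of the image's pixels.

$$\bar{R} = \tfrac{1}{4}(4.37) + \tfrac{1}{4}(2.86) + \tfrac{1}{2}(2.36) = 1.0925 + 0.715 + 1.18 \approx 2.99 \ \text{b/pixel}$$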

Fig. 7. Error image for GIRL image coded at three different rates.

To see how this algorithm performs on a relative scale, we compare
it to the differential scheme proposed by Maragos, Schafer, and
Mersereau [8]. The system proposed by Maragos et al. uses a forward
adaptive two-dimensional predictor and a backward adaptive quantizer.
The predictor coefficients are obtained over a 32 by 32 or a 16 by 16
block and transmitted as side information. The proposed system is, we
feel, considerably simpler, because it needs no adaptation and no side
information. Nevertheless, its results compare favorably with those of
the system of [8]. Comparative results are shown in Table V. The results
were obtained by varying the stepsize Δ until the rate obtained was
similar to the rate in [8], and then comparing the PSNR. As in [8], to
obtain rates below 1 b/pixel, several coder outputs were concatenated
into blocks, which were then Huffman encoded. For the results shown in
Table V, we used a block size of three. Given a five-level recursive
quantizer, this corresponds to an alphabet size of 125, which would be
somewhat excessive for a simple implementation. (In [8], block sizes of
four to eight are used with two- and three-level quantizers.)

TABLE V
Comparison of Proposed System with That of [8]

Results from [8]                    Results from Proposed System
(Frame Size = 32, 3-Level AQB)      (Alphabet Size 5)
Rate (b/pixel)   PSNR (dB)          Rate (b/pixel)   PSNR (dB)
0.74             30.3               0.74             31.13
0.83             31.6               0.84             32.1
0.93             32.6               0.94             33.1
1.03             33.4               1.03             33.9

The above comparison is not meant to indicate that the two systems
being compared are mutually exclusive. A case can be made for combining
the good features of both systems. For example, the prediction scheme
described in [8] could be combined with the quantization scheme
described here. However, we felt that in this particular case the
advantages of adding a forward adaptive predictor were offset by the
increase in complexity and synchronization requirements.

V. Conclusion

We have demonstrated a simple image coding scheme which is
very easy to implement in real time and has excellent edge
preservation properties over a wide range of rates.

This system would be especially useful for transmitting images
over channels where the available bandwidth may vary. The edge
preserving quality is especially useful in the encoding of scientific
and medical images.